Marvelous data

Exploring data with pandas



In [85]:

    
speaker = {'name':'Mai Giménez', 
           'twitter': '@adahopper',
           'weapons': ['python', 'bash', 'C++']}

print('\n'.join(["{}: {}".format(k, v) for k,v in speaker.items()]))









    



name: Mai Giménez
weapons: ['python', 'bash', 'C++']
twitter: @adahopper



In [2]:

    
from IPython.display import Image
Image(filename='marvel_logo.jpg')









    Out[2]:

Marvel is an American comic book publisher founded by Martin Goodman in 1939, although the Marvel we know today dates from 1961, with the publication of The Fantastic Four and other superhero stories created by Stan Lee, Jack Kirby, Steve Ditko, ...

Marvel publishes world-famous characters such as:

Spider-Man
X-Men
Captain America
Guardians of the Galaxy
...

[Wikipedia]

And all this data is ours!

Collecting the data



In [3]:

    
from IPython.core.display import HTML
MARVEL_DEV_SITE = "http://developer.marvel.com/"
HTML("<iframe src={} width=800 height=600></iframe>".format(MARVEL_DEV_SITE))









    Out[3]:

Pandas time!

pandas is an open-source, BSD-licensed library for efficient data analysis in Python.

pandas is good at:

Efficient data structures (DataFrames) for working with indexed data.
Tools for reading and writing data efficiently. It can work with several formats:
- CSV.
- Text files.
- Microsoft Excel.
- SQL databases.
- HDF5 format.
- ...
Flexible reshaping and pivoting of data sets.
Intelligent label-based selection, fancy indexing, and subsetting of large data sets.
Columns can be inserted and deleted: data sets are size-mutable.
Easy grouping and merging of data sets.
Time series functionality: efficient handling of date ranges.
...
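
As a quick taste of several items on this list, here is a minimal sketch on a tiny in-memory CSV. The rows are made up for illustration and are not taken from the Marvel dump.

```python
import io
import pandas as pd

# Made-up sample rows, only for illustration.
csv_text = """name,team,comics
Hero A,Team X,30
Hero B,Team X,12
Hero C,Team Y,7
"""

# Reading: parse CSV text into a DataFrame indexed by name.
heroes = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Label-based selection of a single cell.
hero_a_comics = heroes.loc["Hero A", "comics"]

# Columns can be inserted and deleted.
heroes["popular"] = heroes["comics"] > 10
heroes = heroes.drop("popular", axis=1)

# Grouping and aggregating.
comics_per_team = heroes.groupby("team")["comics"].sum()

# Writing: with no path given, to_csv returns the CSV text.
round_trip = heroes.to_csv()
```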



In [4]:

    
PANDAS_DEV_SITE = "http://pandas.pydata.org/"
HTML("<iframe src={} width=800 height=600></iframe>".format(PANDAS_DEV_SITE))









    Out[4]:



In [5]:

    
import pandas as pd
import sys
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

%matplotlib inline

print("Python version:     ", sys.version)
print("pandas version:     ", pd.__version__)
print("NumPy version:      ", np.__version__)
print("Matplotlib version: ", matplotlib.__version__)



Python version:      3.3.4 (default, Jul 25 2014, 00:04:27) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
pandas version:      0.14.1
NumPy version:       1.8.2
Matplotlib version:  1.4.0

Reading the data

Marvel only lets us fetch up to 100 characters/comics per request. There is a Python library for accessing the Marvel API directly: pymarvel, developed by Garrett Pennington for Python 2 and ported to Python 3 as pymarvel3.
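
With a cap of 100 items per request, a full download has to be split into pages. The sketch below only computes the paging parameters and makes no network calls. The helper name page_params and the parameter names offset and limit are assumptions for illustration; check the API documentation for the real query parameters and authentication.

```python
def page_params(total, page_size=100):
    """Yield one dict of paging parameters per request needed to cover `total` items.

    `offset` and `limit` are assumed parameter names, not confirmed by this notebook.
    """
    for offset in range(0, total, page_size):
        yield {"offset": offset, "limit": min(page_size, total - offset)}


# Hypothetical total of 250 items: three requests are needed.
pages = list(page_params(250))
```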

The first thing we should do is collect the information from the web and store it locally. But someone else has already thought of that, and we are not going to reinvent the wheel. @asamiller has developed a Node.js app that crawls the Marvel API and stores the data using Orchestrate. The code is available on GitHub.



In [6]:

    
from os.path import join, abspath, isfile 
from os import listdir, getcwd, pardir

MARVELOUSDB_PATH = join(abspath(join(getcwd(), pardir)),"marvelousdb","data")
MARVELOUSDB_CHARACTERS = join(MARVELOUSDB_PATH,"characters")
MARVELOUSDB_COMICS = join(MARVELOUSDB_PATH,"comics")



In [7]:

    
characters_json_db = [join(MARVELOUSDB_CHARACTERS,json_file) for json_file in listdir(MARVELOUSDB_CHARACTERS)]
comics_json_db = [join(MARVELOUSDB_COMICS,json_file) for json_file in listdir(MARVELOUSDB_COMICS)]
print("MarvelousDB holds a backup of {} characters and {} comics".format(
    len(characters_json_db), len(comics_json_db)))



MarvelousDB holds a backup of 1402 characters and 30180 comics

DataFrame

A DataFrame is a 2-dimensional structure with data labeled in columns. The data in a DataFrame can be of different types. Think of a DataFrame as a spreadsheet or a SQL table.

You can build a DataFrame from:

A dict of 1D ndarrays, lists, dicts or Series (pandas).
A 2D ndarray.
Another DataFrame.
...

When creating a DataFrame you can also specify the index (row labels) and the columns. If we do not pass these labels as arguments, pandas builds the DataFrame using sensible defaults.
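
A minimal sketch of the construction options listed above, using small illustrative values rather than the Marvel data:

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels, rows get the default index 0..n-1.
from_dict = pd.DataFrame({"name": ["A", "B"], "eyes": ["Green", "Blue"]})

# From a 2D ndarray, with explicit row and column labels.
from_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                          index=["row_a", "row_b"],
                          columns=["x", "y", "z"])

# From another DataFrame, keeping only some of its columns.
from_frame = pd.DataFrame(from_array, columns=["x", "z"])
```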

In our case, we will read all the JSON files and create a DataFrame. Because the JSON files contain hierarchical information we need to flatten (normalize) the data, but pandas has functions that do this for us.
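
To see what normalizing does, here is a small sketch on a hypothetical record shaped loosely like a character file. Nested dicts become dotted column names, while lists are kept as a single cell. Recent pandas releases expose the function as pandas.json_normalize; the import below falls back to the older pandas.io.json location used in this notebook.

```python
try:
    from pandas import json_normalize          # recent pandas releases
except ImportError:
    from pandas.io.json import json_normalize  # older releases

# Hypothetical record, not a real entry from the dump.
record = {"id": 1,
          "name": "Example Hero",
          "comics": {"available": 3, "items": [{"name": "Issue #1"}]},
          "wiki": {"eyes": "Blue"}}

# One row; nested keys are flattened into 'comics.available', 'wiki.eyes', ...
flat = json_normalize(record)
```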

Idiomatic



In [8]:

    
import json



In [9]:

    
json_to_dataframe = []
for json_file in characters_json_db:
    with open(json_file, 'r') as jf:
        json_character = json.load(jf)
        json_plain = pd.io.json.json_normalize(json_character)
        json_to_dataframe.append(json_plain)
        
characters_df = pd.concat(json_to_dataframe)

Non idiomatic



In [10]:

    
df = pd.concat([pd.io.json.json_normalize(json.loads(''.join(open(json_file,'r').readlines()))) 
                for json_file in characters_json_db])

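We can perform logical operations over all the elements of a DataFrame; they are vectorized operations. One caveat about the check below: Python's built-in all iterates over a DataFrame's column labels, so it only confirms that the labels are truthy. df.equals(characters_df) is the comparison that actually looks at the values and treats matching NaNs as equal.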



In [12]:

    
all(df == characters_df)









    Out[12]:





True
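
A small sketch, on made-up frames, of how the built-in all, an element-wise == and DataFrame.equals differ:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [np.nan, 4.0]})
right = pd.DataFrame({"a": [1, 99], "b": [np.nan, 4.0]})

# Built-in all() iterates over the column labels "a" and "b", both truthy,
# so it is True even though the frames differ.
labels_truthy = all(left == right)

# Element-wise comparison; note that NaN == NaN is False.
cellwise = left == right

# equals() compares values and treats NaNs in the same position as equal.
same_as_copy = left.equals(left.copy())
same_as_right = left.equals(right)
```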



In [13]:

    
comics_df = pd.concat([pd.io.json.json_normalize(json.loads(''.join(open(json_file,'r').readlines()))) 
                       for json_file in comics_json_db if isfile(json_file)])

And what does a DataFrame look like?



In [14]:

    
characters_df.head()









    Out[14]:






  
    
      
      comics.available
      comics.collectionURI
      comics.items
      comics.returned
      description
      events.available
      events.collectionURI
      events.items
      events.returned
      id
      ...
      wiki.specieshistory
      wiki.team_name
      wiki.teamicon
      wiki.technology
      wiki.tie-ins
      wiki.title_graphic
      wiki.universe
      wiki.weapons
      wiki.weaponss
      wiki.weight
    
  
  
    
      0
       36
       http://gateway.marvel.com/v1/public/characters...
       [{'id': 36737, 'resourceURI': 'http://gateway....
       36
       AIM is a terrorist organization bent on destro...
       0
       http://gateway.marvel.com/v1/public/characters...
                                                      []
       0
       1009144
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
                                                     NaN
       NaN
                                              NaN
    
    
      0
       43
       http://gateway.marvel.com/v1/public/characters...
       [{'id': 34050, 'resourceURI': 'http://gateway....
       43
       Formerly known as Emil Blonsky, a spy of Sovie...
       2
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
       1009146
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
           Marvel Universe
                                                    None
       NaN
       (Abomination) 980 lbs.; (Blonsky) 180 lbs.
    
    
      0
       43
       http://gateway.marvel.com/v1/public/characters...
       [{'id': 36489, 'resourceURI': 'http://gateway....
       43
                                                        
       4
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       4
       1009148
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
       He uses a prison ball-and-chain as a weapon, a...
       NaN
                              365 lbs. (variable)
    
    
      0
        8
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
        8
                                                        
       1
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
       1009149
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
                                              Unrevealed
       NaN
                                       Unrevealed
    
    
      0
       20
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       20
                                                        
       0
       http://gateway.marvel.com/v1/public/characters...
                                                      []
       0
       1009150
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
       Agent Zero carries a wide array of weapons inc...
       NaN
                                         230 lbs.
    
  

5 rows × 89 columns



In [15]:

    
comics_df.tail()









    Out[15]:






  
    
      
      characters.available
      characters.collectionURI
      characters.items
      characters.returned
      collectedIssues
      collections
      creators.available
      creators.collectionURI
      creators.items
      creators.returned
      ...
      stories.items
      stories.returned
      textObjects
      thumbnail.extension
      thumbnail.path
      title
      upc
      urls
      variantDescription
      variants
    
  
  
    
      0
       0
       http://gateway.marvel.com/v1/public/comics/999...
                                                      []
       0
       []
       []
       0
       http://gateway.marvel.com/v1/public/comics/999...
                                                      []
       0
      ...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
       []
       jpg
       http://i.annihil.us/u/prod/marvel/i/mg/b/40/im...
       Love Romances (1949) #94
       
       [{'type': 'detail', 'url': 'http://marvel.com/...
       
       []
    
    
      0
       0
       http://gateway.marvel.com/v1/public/comics/999...
                                                      []
       0
       []
       []
       1
       http://gateway.marvel.com/v1/public/comics/999...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
      ...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
       []
       jpg
       http://i.annihil.us/u/prod/marvel/i/mg/b/40/im...
       Love Romances (1949) #96
       
       [{'type': 'detail', 'url': 'http://marvel.com/...
       
       []
    
    
      0
       0
       http://gateway.marvel.com/v1/public/comics/999...
                                                      []
       0
       []
       []
       2
       http://gateway.marvel.com/v1/public/comics/999...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
      ...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
       []
       jpg
       http://i.annihil.us/u/prod/marvel/i/mg/b/40/im...
       Love Romances (1949) #97
       
       [{'type': 'detail', 'url': 'http://marvel.com/...
       
       []
    
    
      0
       0
       http://gateway.marvel.com/v1/public/comics/999...
                                                      []
       0
       []
       []
       2
       http://gateway.marvel.com/v1/public/comics/999...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
      ...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
       []
       jpg
       http://i.annihil.us/u/prod/marvel/i/mg/b/40/im...
       Love Romances (1949) #99
       
       [{'type': 'detail', 'url': 'http://marvel.com/...
       
       []
    
    
      0
       3
       http://gateway.marvel.com/v1/public/comics/999...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       3
       []
       []
       8
       http://gateway.marvel.com/v1/public/comics/999...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       8
      ...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
       []
       jpg
       http://i.annihil.us/u/prod/marvel/i/mg/c/a0/4b...
          Magneto Rex (1999) #1
       
       [{'type': 'detail', 'url': 'http://marvel.com/...
       
       []
    
  

5 rows × 43 columns

pandas DataFrames are built on top of NumPy, so finding out the dimensions of a DataFrame works exactly as it does in NumPy. Easy, right?



In [16]:

    
characters_df.shape









    Out[16]:





(1402, 89)



In [17]:

    
comics_df.shape









    Out[17]:





(30179, 43)

Let's see what we can find out about the characters



In [18]:

    
', '.join(characters_df.columns.values)









    Out[18]:





'comics.available, comics.collectionURI, comics.items, comics.returned, description, events.available, events.collectionURI, events.items, events.returned, id, modified, name, resourceURI, series.available, series.collectionURI, series.items, series.returned, stories.available, stories.collectionURI, stories.items, stories.returned, thumbnail.extension, thumbnail.path, urls, wiki.Date_of_birth, wiki.Place_of_birth, wiki.abilities, wiki.aliases, wiki.appearance, wiki.base_of_operations, wiki.bio, wiki.bio_text, wiki.blurb, wiki.builder, wiki.categories, wiki.categorytext, wiki.citizenship, wiki.creator, wiki.creators, wiki.current_members, wiki.debut, wiki.distinguishing_features, wiki.dstinguishing_features, wiki.education, wiki.event_text, wiki.eyes, wiki.features, wiki.former_members, wiki.govenment, wiki.government, wiki.groups, wiki.hair, wiki.height, wiki.home_world, wiki.identity, wiki.key_characters, wiki.key_issues, wiki.leader, wiki.location, wiki.main_image, wiki.members, wiki.object_text, wiki.occupation, wiki.origin, wiki.other_members, wiki.owner, wiki.paraphernalia, wiki.place_of_birth, wiki.place_of_creation, wiki.place_text, wiki.points_of_interest, wiki.power, wiki.powers, wiki.real_name, wiki.relatives, wiki.significant_citizens, wiki.significant_issues, wiki.skin, wiki.special_limitations, wiki.specieshistory, wiki.team_name, wiki.teamicon, wiki.technology, wiki.tie-ins, wiki.title_graphic, wiki.universe, wiki.weapons, wiki.weaponss, wiki.weight'

We shouldn't celebrate just yet, though, because (spoiler) many of the fields are empty.



In [19]:

    
characters_df.dropna()









    Out[19]:






  
    
      
      comics.available
      comics.collectionURI
      comics.items
      comics.returned
      description
      events.available
      events.collectionURI
      events.items
      events.returned
      id
      ...
      wiki.specieshistory
      wiki.team_name
      wiki.teamicon
      wiki.technology
      wiki.tie-ins
      wiki.title_graphic
      wiki.universe
      wiki.weapons
      wiki.weaponss
      wiki.weight
    
  
  
  

0 rows × 89 columns

And what about the comics?



In [20]:

    
comics_df.dropna().shape









    Out[20]:





(17516, 43)

With a single instruction we can deal with all the nulls in a DataFrame.
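
By default dropna removes every row that has at least one null, which is why the characters table came back empty. A minimal sketch on a made-up frame, also showing how to count the nulls per column before deciding what to drop:

```python
import numpy as np
import pandas as pd

# Made-up frame with scattered nulls.
frame = pd.DataFrame({"name": ["A", "B", "C"],
                      "height": [1.8, np.nan, np.nan],
                      "weight": [80.0, 75.0, np.nan]})

any_null_dropped = frame.dropna()           # default how='any': only row A survives
all_null_dropped = frame.dropna(how="all")  # drops only rows where every cell is null
nulls_per_column = frame.isnull().sum()     # how sparse each column is
```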

Stan Lee

Stanley Martin Lieber, better known as Stan Lee, was born on December 28, 1922 in New York City. He is an American comic book writer and editor, creator of characters noted for their complexity and realism.

He is the co-creator, together with artists such as Steve Ditko and Jack Kirby, of superheroes such as the Fantastic Four, Spider-Man, Hulk, Iron Man, Thor, The Avengers, Daredevil, Doctor Strange, X-Men and many other characters, expanding Marvel Comics from a small publishing house into a large multimedia corporation. To this day, Marvel comics are distinguished by always carrying "Stan Lee presents" in their title captions. He also has a show on the History Channel in which he searches for real-life superhumans. [Wikipedia]

Let's see how many characters he has created, and who tops the list of creators according to the Marvel API.

Series

A Series is a labeled 1-dimensional array, like a table with a single column. It can hold any data type:

Integers.
Strings.
Floating-point numbers.
Python objects.
...

Elements are labeled by the index. If the index we pass consists of dates, a TimeSeries instance is created (in the pandas version used here; newer releases simply use a Series with a DatetimeIndex). Well thought out, right?

When we select a single column from a DataFrame we get a Series.
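
A short sketch of these three points. The weights are the ones shown in the characters table above; the dates and daily values are made up.

```python
import pandas as pd

# A Series with an explicit label index.
weights = pd.Series([980, 365, 230],
                    index=["Abomination", "Absorbing Man", "Agent Zero"])

# A Series indexed by dates behaves as a time series.
dates = pd.date_range("2014-01-01", periods=3, freq="D")
daily = pd.Series([1.0, 2.0, 3.0], index=dates)

# Selecting one column of a DataFrame returns a Series.
frame = pd.DataFrame({"weight": weights, "heavy": weights > 300})
column = frame["weight"]
```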



In [21]:

    
#Stan Lee 
creators_serie = characters_df['wiki.creators'].dropna()
creators_serie.describe()









    Out[21]:





count                               119
unique                               37
top       this has not been updated yet
freq                                 44
dtype: object



In [22]:

    
# Rename the series and its index
creators_serie.name = 'Creadorers de personajes'
creators_serie.index.name = 'creators'

# We can use head() or, since this is a Series, slice it as we would a list
# creators_serie.head()
creators_serie[:20]









    Out[22]:





creators
0            this has not been updated yet
0                                         
0                                         
0                                         
0                                         
0                                         
0                                         
0                  Peter David & Sam Keith
0               Bill Mantlo and Ed Hanigan
0                                         
0                 Stan Lee and Steve Ditko
0             Grant Morrison & Igor Kordey
0                          Chris Claremont
0                                         
0           Chris Claremont & Dave Cockrum
0            this has not been updated yet
0            this has not been updated yet
0                                         
0                     Stan Lee, Jack Kirby
0                           Grant Morrison
Name: Creadorers de personajes, dtype: object

Using masks to extract information



In [23]:

    
default_string = creators_serie != "this has not been updated yet"
default_string.head()
#creators_serie[ creators_serie != "this has not been updated yet" ]









    Out[23]:





creators
0           False
0            True
0            True
0            True
0            True
Name: Creadorers de personajes, dtype: bool



In [24]:

    
empty_string = creators_serie != ""
empty_string[:10]









    Out[24]:





creators
0            True
0           False
0           False
0           False
0           False
0           False
0           False
0            True
0            True
0           False
Name: Creadorers de personajes, dtype: bool



In [25]:

    
default_string and empty_string









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-544bf713079b> in <module>()
----> 1 default_string and empty_string

/Users/ada/Dev/.virtualenvs/marvel/lib/python3.3/site-packages/pandas/core/generic.py in __nonzero__(self)
    690         raise ValueError("The truth value of a {0} is ambiguous. "
    691                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692                          .format(self.__class__.__name__))
    693 
    694     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

We might expect the keyword and to combine two Series, but it does not work: it asks each Series for a single truth value instead of being applied element by element. pandas knows we may need this, though, and provides element-wise operators: & (and), | (or), ~ (not).
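
A minimal sketch of the three element-wise operators on a three-item Series built from values seen in the output above:

```python
import pandas as pd

creators = pd.Series(["this has not been updated yet", "", "Stan Lee and Steve Ditko"])

not_default = creators != "this has not been updated yet"
not_empty = creators != ""

both = not_default & not_empty     # element-wise and
either = not_default | not_empty   # element-wise or
negated = ~not_empty               # element-wise not

# When the comparisons are written inline they need parentheses,
# because & binds more tightly than != or ==.
inline = (creators != "") & (creators != "this has not been updated yet")
```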



In [26]:

    
creators_mask = default_string & empty_string
creators_mask[:10]









    Out[26]:





creators
0           False
0           False
0           False
0           False
0           False
0           False
0           False
0            True
0            True
0           False
Name: Creadorers de personajes, dtype: bool



In [27]:

    
creators_serie[creators_mask].head()









    Out[27]:





creators
0                Peter David & Sam Keith
0             Bill Mantlo and Ed Hanigan
0               Stan Lee and Steve Ditko
0           Grant Morrison & Igor Kordey
0                        Chris Claremont
Name: Creadorers de personajes, dtype: object

We now have a good part of the information we want, but let's split up the authors who work together so that we can count how many characters each one has created.



In [28]:

    
import re
creators = [re.split('&|and|,', line) for line in creators_serie[creators_mask]]
clean_cretors =  pd.Series([c for creator in creators for c in creator])
clean_cretors.head()









    Out[28]:





0    Peter David 
1       Sam Keith
2    Bill Mantlo 
3      Ed Hanigan
4       Stan Lee 
dtype: object



In [29]:

    
clean_cretors.value_counts().head()









    Out[29]:





Chris Claremont     10
Stan Lee             7
 John Byrne          6
Chris Claremont      5
 Jack Kirby          4
dtype: int64

Whoa, Stan Lee, it looks like Chris Claremont beats you!

Obviously this is a problem of missing data, which is why we must be very careful about how much we trust our results. A corpus with errors leads to wrong conclusions, and we have to be aware of that. Note too that the count above lists Chris Claremont twice: the split leaves stray spaces around the names, so they should be stripped before counting.
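
One possible way to tidy the split, sketched on a few credit strings taken from the output above. The regular expression is an assumption about the separators in use: it treats '&', ',' and the standalone word 'and' as separators and swallows the surrounding spaces, so a name that merely contains the letters "and" is left intact.

```python
import re
import pandas as pd

credits = pd.Series(["Peter David & Sam Keith",
                     "Bill Mantlo and Ed Hanigan",
                     "Stan Lee, Jack Kirby",
                     "Stan Lee and Steve Ditko"])

# Separators: '&', ',' or the whole word 'and', plus any spaces around them.
separator = re.compile(r"\s*(?:&|,|\band\b)\s*")

names = [name.strip()
         for line in credits
         for name in separator.split(line)
         if name.strip()]

counts = pd.Series(names).value_counts()
```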

Exploring the superheroes

Cleaning the data: removing groups

Marvel does not distinguish characters from groups of characters. That is, "The Avengers" is a character just as "Iron Man" could be, but the wiki has a field that lets us tell groups apart from characters: "Current members". So let's keep only the characters.

Normally we would want to drop the rows that contain nulls, and pandas has a function for that, dropna, which we have already seen. Here, however, we want to keep the rows whose current_members column is null, because an entry with no members is a single character.
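
The idea can be sketched on a made-up miniature of the characters table: a boolean mask from isnull keeps the rows with no members, and its negation keeps the groups.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the characters table.
entries = pd.DataFrame({
    "name": ["Group A", "Hero B", "Group C", "Hero D"],
    "wiki.current_members": ["B, D", np.nan, "X, Y", np.nan],
})

is_character = entries["wiki.current_members"].isnull()
characters = entries[is_character]   # rows without members
groups = entries[~is_character]      # rows with members
```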



In [30]:

    
characters_df.dropna(subset=['wiki.current_members'])['name']









    Out[30]:





0                         A.I.M.
0                       Avengers
0    Brotherhood of Evil Mutants
0                         Exiles
0                 Fantastic Four
0                    Force Works
0                  Hellfire Club
0                          Hydra
0                 Imperial Guard
0                      Marauders
0                        Reavers
0                   S.H.I.E.L.D.
0                Serpent Society
0                        X-Force
0                          X-Men
...
0                         Sinister Six
0                          ClanDestine
0                            New X-Men
0                      Masters of Evil
0                         Generation X
0              Guardians of the Galaxy
0                               U-Foes
0                            Sentinels
0                          New Mutants
0             Lightning Lords of Nepal
0           Nine-Fold Daughters of Xao
0          Confederates of the Curious
0                             X-Babies
0                        Lethal Legion
0    Brotherhood of Mutants (Ultimate)
Name: name, Length: 70, dtype: object



In [31]:

    
%timeit (~characters_df['wiki.current_members'].isnull())

import numpy as np
%timeit (np.invert(characters_df['wiki.current_members'].isnull()))









    



1000 loops, best of 3: 218 µs per loop
1000 loops, best of 3: 233 µs per loop



In [32]:

    
not_groups_mask = characters_df['wiki.current_members'].isnull()
not_groups_mask.head()









    Out[32]:





0    False
0     True
0     True
0     True
0     True
Name: wiki.current_members, dtype: bool



In [33]:

    
characters_df=characters_df[not_groups_mask]



In [34]:

    
characters_df[:3]









    Out[34]:






  
    
      
      comics.available
      comics.collectionURI
      comics.items
      comics.returned
      description
      events.available
      events.collectionURI
      events.items
      events.returned
      id
      ...
      wiki.specieshistory
      wiki.team_name
      wiki.teamicon
      wiki.technology
      wiki.tie-ins
      wiki.title_graphic
      wiki.universe
      wiki.weapons
      wiki.weaponss
      wiki.weight
    
  
  
    
      0
       43
       http://gateway.marvel.com/v1/public/characters...
       [{'id': 34050, 'resourceURI': 'http://gateway....
       43
       Formerly known as Emil Blonsky, a spy of Sovie...
       2
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       2
       1009146
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
           Marvel Universe
                                                    None
       NaN
       (Abomination) 980 lbs.; (Blonsky) 180 lbs.
    
    
      0
       43
       http://gateway.marvel.com/v1/public/characters...
       [{'id': 36489, 'resourceURI': 'http://gateway....
       43
                                                        
       4
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       4
       1009148
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
       He uses a prison ball-and-chain as a weapon, a...
       NaN
                              365 lbs. (variable)
    
    
      0
        8
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
        8
                                                        
       1
       http://gateway.marvel.com/v1/public/characters...
       [{'resourceURI': 'http://gateway.marvel.com/v1...
       1
       1009149
      ...
       NaN
       NaN
       NaN
       NaN
       NaN
       NaN
       [[Marvel Universe]]
                                              Unrevealed
       NaN
                                       Unrevealed
    
  

3 rows × 89 columns

Let's clean the data, keep the fields that may be useful, and index the DataFrame by the name of the superhero or superheroine; pandas did what it could, but the numeric index is not very intuitive.



In [35]:

    
# Group the column names so it is clear what we want to work with
physical_data = ['wiki.hair', 'wiki.weight', 'wiki.height', 'wiki.eyes']
cultural_data = ['wiki.education', 'wiki.citizenship', 'wiki.place_of_birth', 'wiki.occupation']
personal_data = ['wiki.bio', 'wiki.bio_text', 'wiki.categories']

data_keys = (physical_data + cultural_data + personal_data + ['name','comics.available'])

Remember dropna()? Well, it can do much more.



In [36]:

    
clean_df = characters_df.dropna(subset = data_keys)
clean_df = clean_df[data_keys].set_index('name')
clean_df.shape









    Out[36]:





(762, 12)
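
The same two steps on a made-up frame: subset restricts the null check to the columns we care about, and set_index turns the name column into the row labels.

```python
import numpy as np
import pandas as pd

# Made-up frame; 'wiki.skin' is entirely null but is not in the subset.
raw = pd.DataFrame({"name": ["A", "B", "C"],
                    "wiki.eyes": ["Blue", np.nan, "Green"],
                    "wiki.skin": [np.nan, np.nan, np.nan]})

wanted = ["name", "wiki.eyes"]

# Only nulls in the wanted columns cause a row to be dropped.
kept = raw.dropna(subset=wanted)

# Keep just those columns and use the name as the row label.
tidy = kept[wanted].set_index("name")
```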

Racial, cultural and gender representation in Marvel comics

For example, it would be very interesting to know how many races are represented in Marvel comics, and the wiki does have a skin field, but...



In [37]:

    
characters_df['wiki.skin'].dropna()









    Out[37]:





0    White (as GAmbit), Black (as Death)
Name: wiki.skin, dtype: object

But let's explore what we do have.



In [38]:

    
clean_df[personal_data].head()









    Out[38]:






  
    
      
                            wiki.bio                                           wiki.bio_text                                      wiki.categories
name
Abomination (Emil Blonsky)  Formerly known as Emil Blonsky, a spy of Sovie...  Formerly known as Emil Blonsky, a spy of Sovie...  [Avengers, Deceased, Hulk, International, Vill...
Absorbing Man               Crusher Creel's life was little more than that...  Crusher Creel's life was little more than that...  [Avengers, Civil War, Villains]
Abyss                       Sealed in a coffin-like prison, Abyss was take...  Sealed in a coffin-like prison, Abyss was take...  [Cosmic, Magic, Villains]
Agent Zero                  Born in the former East Germany, Christoph Nor...  Born in the former East Germany, Christoph Nor...  [Heroes, X-Men, Villains, International, Mutants]
Annihilus                   Untold millennia ago, the Tyannans, a technolo...  Untold millennia ago, the Tyannans, a technolo...  [Annihilation, Cosmic, Fantastic Four, Villains]



In [39]:

    
clean_df[cultural_data].head()









    Out[39]:






  
    
      
                            wiki.education       wiki.citizenship                                  wiki.place_of_birth                                wiki.occupation
name
Abomination (Emil Blonsky)  Unrevealed           Citizen of Croatia; former citizen of Yugoslavia  Zagreb, Yugoslavia                                 Professional Criminal, Former Spy
Absorbing Man               High school dropout  U.S.A. with a criminal record                     New York City, New York                            Professional criminal; former boxer
Abyss                       Unrevealed           Unrevealed                                        Unrevealed                                         Cosmic sorcerer
Agent Zero                  Unrevealed           German                                            Unrevealed location in former East Germany         Mercenary, former government operative, freedo...
Annihilus                   Unrevealed           Arthros                                           Planet of [[Arthros]], Sector 17A, [[Negative ...  Conqueror, scavenger



In [40]:

    
clean_df[cultural_data].describe()









    Out[40]:






  
    
      
        wiki.education  wiki.citizenship  wiki.place_of_birth  wiki.occupation
count   762             762               762                  762
unique  357             262               412                  636
top     Unrevealed      U.S.A.            Unrevealed           Adventurer
freq    236             230               156                  31



In [41]:

    
clean_df[physical_data].head()









    Out[41]:






  
    
      
                            wiki.hair                              wiki.weight                                 wiki.height                          wiki.eyes
name
Abomination (Emil Blonsky)  (Abomination) None; (Blonsky) Blond    (Abomination) 980 lbs.; (Blonsky) 180 lbs.  (Abomination) 6'8"; (Blonsky) 5'10"  (Abomination) Green; (Blonsky) Blue
Absorbing Man               Bald                                   365 lbs. (variable)                         6'4" (variable)                      Blue
Abyss                       Unrevealed                             Unrevealed                                  Unrevealed                           Unrevealed
Agent Zero                  (Originally) Brown; (currently) Black  230 lbs.                                    6'3"                                 Blue
Annihilus                   None                                   200 lbs.                                    5'11"                                Green

What would you say the typical Marvel character looks like physically? (pandas knows.)



In [42]:

    
clean_df[physical_data].describe()









    Out[42]:






  
    
      
        wiki.hair  wiki.weight  wiki.height  wiki.eyes
count         762          762          762        762
unique        223          307          213        165
top         Black   Unrevealed   Unrevealed       Blue
freq          165           48           44        236

So the archetypal Marvel character has black hair and blue eyes, is from the U.S.A. and works as an adventurer. I like the job already.
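For text (object) columns, describe() reports count, unique, top and freq instead of means and quartiles, which is how the table above yields the most common hair and eye colour. Here is a minimal sketch on made-up data (the `eyes` values are hypothetical, not taken from the Marvel dataset); value_counts() shows the full frequency table behind top and freq:

```python
import pandas as pd

# Hypothetical toy data standing in for a column such as wiki.eyes.
eyes = pd.Series(['Blue', 'Brown', 'Blue', 'Green', 'Blue', 'Brown'], name='eyes')

summary = eyes.describe()     # count, unique, top, freq for object data
counts = eyes.value_counts()  # every value with its frequency, most common first

print(summary)
print(counts)
```

Looking at value_counts() also makes it easy to spot placeholder values such as "Unrevealed" that can win the "top" slot without saying much about the characters.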

Which character appears in the most comics?



In [43]:

    
clean_df['comics.available'].describe()









    Out[43]:





count     762.000000
mean       53.292651
std       179.820372
min         0.000000
25%         2.000000
50%        10.000000
75%        33.750000
max      2575.000000
dtype: float64

2575? That must be a mistake, right? Who is the one holding down so many jobs?



In [44]:

    
clean_df[clean_df['comics.available'] == 2575.000000]









    Out[44]:






  
    
      
name                   Spider-Man
wiki.hair              Brown
wiki.weight            167 lbs.
wiki.height            5'10"
wiki.eyes              Hazel
wiki.education         College graduate (biophysics major), doctorate...
wiki.citizenship       U.S.A.
wiki.place_of_birth    Forest Hills, New York
wiki.occupation        Scientist and inventor; former freelance photo...
wiki.bio               The bite of an irradiated spider granted high-...
wiki.bio_text          The bite of an irradiated spider granted high-...
wiki.categories        [Avengers, Civil War, Heroes, Marvel Knights, ...
comics.available       2575

I don't know whether it is an error, but if it isn't, that crybaby Spider-Man appears in a lot of comics.
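Comparing against the literal 2575.000000 works, but it hard-codes a number copied from the previous output. One possible alternative, sketched here on a hypothetical three-row stand-in for clean_df (`toy_df` is an assumption for illustration), is to ask pandas for the label of the maximum with idxmax():

```python
import pandas as pd

# Hypothetical miniature of clean_df: character names as the index.
toy_df = pd.DataFrame(
    {'comics.available': [43, 2575, 20]},
    index=pd.Index(['Absorbing Man', 'Spider-Man', 'Agent Zero'], name='name'),
)

top_name = toy_df['comics.available'].idxmax()  # index label of the largest value
top_row = toy_df.loc[[top_name]]                # double brackets keep a DataFrame

print(top_name)
print(top_row)
```

Note that idxmax() returns only the first label when several rows tie for the maximum, so the equality filter used in the notebook is still the safer choice if ties matter.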

Distribution of heroes and villains by gender

Before we play with the data (some more), note that there is one column we can get a lot out of: "wiki.categories".



In [45]:

    
clean_df.iloc[1]









    Out[45]:





wiki.hair                                                           Bald
wiki.weight                                          365 lbs. (variable)
wiki.height                                              6'4" (variable)
wiki.eyes                                                           Blue
wiki.education                                       High school dropout
wiki.citizenship                           U.S.A. with a criminal record
wiki.place_of_birth                              New York City, New York
wiki.occupation                      Professional criminal; former boxer
wiki.bio               Crusher Creel's life was little more than that...
wiki.bio_text          Crusher Creel's life was little more than that...
wiki.categories                          [Avengers, Civil War, Villains]
comics.available                                                      43
Name: Absorbing Man, dtype: object

A priori we have no information about which characters are men, women or aliens. But Marvel must have guessed that we might be interested in the role of women in comics and included a category, "Women", which makes our life much easier. We are going to create two new columns in the dataframe:

Women: True if the character is female, False otherwise.
Villan: likewise, True if the character is a villain, False otherwise.



In [46]:

    
women = clean_df['wiki.categories'].map(lambda x: 'Women' in x)
clean_df['Women'] = women 
women[:5]









    Out[46]:





name
Abomination (Emil Blonsky)    False
Absorbing Man                 False
Abyss                         False
Agent Zero                    False
Annihilus                     False
Name: wiki.categories, dtype: bool



In [47]:

    
# ~ is element-wise negation
print("Women: #{}, men #{}".format(clean_df[women].shape[0],clean_df[~women].shape[0]))













Women: #199, men #563

That is, we have 199 female characters and 563 that are not tagged as women, which we count as male here. Only about 26% of the characters are female.
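The counts and the percentage can also be read straight off the boolean mask, since True behaves like 1: sum() counts the matches and mean() gives their share (with the notebook's figures, 199 / 762 is roughly 0.26). A small sketch on hypothetical category lists shaped like the wiki.categories column:

```python
import pandas as pd

# Hypothetical category lists; the real column holds one list per character.
categories = pd.Series([
    ['Avengers', 'Villains'],
    ['Heroes', 'Women', 'X-Men'],
    ['Cosmic', 'Villains'],
    ['Heroes', 'Women'],
])

is_woman = categories.map(lambda tags: 'Women' in tags)  # boolean mask

n_women = int(is_woman.sum())   # True counts as 1
share_women = is_woman.mean()   # fraction of True values

print(n_women, "{:.0%}".format(share_women))
```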



In [48]:

    
villan = clean_df['wiki.categories'].map(lambda x: 'Villains' in x)
clean_df['Villan'] = villan



In [49]:

    
print("Villains: #{}, Heroes #{}".format(clean_df[villan].shape[0],clean_df[~villan].shape[0]))













Villains: #231, Heroes #531

The villains have their work cut out for them too, since apparently they make up only 30.31% of the characters.

Let's see how the hero and villain roles are split between men and women.



In [50]:

    
men = ~women
gender_data = {'Women':{'Heroes':0,'Villans':0},'Men':{'Heroes':0,'Villans':0}}
# Women and villans
gender_data['Women']['Villans'] = clean_df[villan & women].shape[0]
# Women and heroes
gender_data['Women']['Heroes'] = clean_df[~villan & women].shape[0]

# Men and villans
gender_data['Men']['Villans'] = clean_df[villan & men].shape[0]
# Men and heroes
gender_data['Men']['Heroes'] = clean_df[~villan & men].shape[0]
gender_data









    Out[50]:





{'Women': {'Villans': 30, 'Heroes': 169},
 'Men': {'Villans': 201, 'Heroes': 362}}
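The same two-by-two table can probably be obtained in one call with pd.crosstab, which counts the combinations of two columns. This is only a sketch on a hypothetical five-row frame (`toy`, `gender` and `role` are names introduced for illustration); it assumes boolean columns like the Women and Villan columns created above:

```python
import pandas as pd

# Hypothetical stand-in for the two boolean columns added to clean_df.
toy = pd.DataFrame({
    'Women':  [False, False, True, True, False],
    'Villan': [True, False, False, True, False],
})

# Turn the booleans into readable labels before cross-tabulating.
gender = toy['Women'].map({True: 'Women', False: 'Men'}).rename('gender')
role = toy['Villan'].map({True: 'Villains', False: 'Heroes'}).rename('role')

table = pd.crosstab(gender, role)  # rows: gender, columns: role, cells: counts
print(table)
```

Because the result is a DataFrame, something like table.plot(kind='bar') would be one way to draw the grouped bar chart without building the tuples by hand.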



In [51]:

    
n_groups = 2
opacity = 0.3
men_data = (gender_data['Men']['Villans'], gender_data['Men']['Heroes'])
women_data = (gender_data['Women']['Villans'], gender_data['Women']['Heroes'])

fig, ax = plt.subplots()

index = np.arange(n_groups)
bar_width = 0.4


rects1 = plt.bar(index, men_data, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Men')

rects2 = plt.bar(index + bar_width, women_data, bar_width,
                 alpha=opacity,
                 color='r',
                 label='Women')

plt.xlabel('Role')
plt.ylabel('Number of characters')
plt.title('Distribution by gender and role')
# Tick labels follow the order of men_data / women_data: villains first, then heroes.
plt.xticks(index + bar_width, ('Villains', 'Heroes'))
plt.legend(loc=0, borderaxespad=1.)

plt.show()

Exploring the comics



In [52]:

    
comics_df.dtypes









    Out[52]:





characters.available          int64
characters.collectionURI     object
characters.items             object
characters.returned           int64
collectedIssues              object
collections                  object
creators.available            int64
creators.collectionURI       object
creators.items               object
creators.returned             int64
dates                        object
description                  object
diamondCode                  object
digitalId                     int64
ean                          object
events.available              int64
events.collectionURI         object
events.items                 object
events.returned               int64
format                       object
id                            int64
images                       object
isbn                         object
issn                         object
issueNumber                 float64
modified                     object
pageCount                     int64
prices                       object
resourceURI                  object
series.name                  object
series.resourceURI           object
stories.available             int64
stories.collectionURI        object
stories.items                object
stories.returned              int64
textObjects                  object
thumbnail.extension          object
thumbnail.path               object
title                        object
upc                          object
urls                         object
variantDescription           object
variants                     object
dtype: object

The prices field still holds a JSON object. Bad! We cannot analyze it like that.

The object type in dtype comes from NumPy and describes an element of an ndarray. Every element must have the same size in bytes. An int64 or a float64 needs 8 bytes, but a string has no fixed length, so what pandas stores is a pointer to a Python object.
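To make the dtype discussion concrete, here is a small sketch on a hypothetical two-row frame (the page counts are made up; the titles and price dictionaries mimic the ones shown in this notebook). Fixed-size numbers get a numeric dtype, while a column of nested lists and dictionaries falls back to object:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the shape of the data matters here.
toy = pd.DataFrame({
    'pageCount': pd.Series([32, 36], dtype='int64'),
    'issueNumber': [94.0, 96.0],
    # Strings are typically object too; newer pandas may show a string dtype.
    'title': ['Love Romances (1949) #94', 'Love Romances (1949) #96'],
    'prices': [[{'price': 1.5, 'type': 'printPrice'}],
               [{'price': 1.25, 'type': 'printPrice'}]],
})

print(toy.dtypes)

# Both numeric types occupy 8 bytes per element.
print(np.dtype('int64').itemsize, np.dtype('float64').itemsize)
```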

But no worries! We are going to convert it to a Series, keep only the print price and fix this column of the dataframe.



In [53]:

    
prices = comics_df.prices



In [54]:

    
prices_serie = prices.apply(pd.Series)



In [55]:

    
prices_serie[20:30]









    Out[55]:






  
    
      
   0                                      1
0  {'price': 1.5, 'type': 'printPrice'}   NaN
0  {'price': 1.5, 'type': 'printPrice'}   NaN
0  {'price': 1.5, 'type': 'printPrice'}   NaN
0  {'price': 1.5, 'type': 'printPrice'}   {'price': 0.99, 'type': 'digitalPurchasePrice'}
0  {'price': 1.5, 'type': 'printPrice'}   {'price': 0.99, 'type': 'digitalPurchasePrice'}
0  {'price': 1.25, 'type': 'printPrice'}  NaN
0  {'price': 1.5, 'type': 'printPrice'}   {'price': 0.99, 'type': 'digitalPurchasePrice'}
0  {'price': 1.5, 'type': 'printPrice'}   {'price': 0.99, 'type': 'digitalPurchasePrice'}
0  {'price': 1.5, 'type': 'printPrice'}   NaN
0  {'price': 1.5, 'type': 'printPrice'}   NaN



In [60]:

    
print_price = prices_serie[0].apply(pd.Series)['price']



In [61]:

    
digital_price = prices_serie[1].apply(pd.Series).price
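The two cells above rely on the print price always being the first entry and the digital price the second. A possibly more defensive variant, sketched on hypothetical rows shaped like the prices column (`price_of`, `toy_print` and `toy_digital` are names introduced here, not part of the notebook), looks each price up by its 'type' field instead of its position:

```python
import pandas as pd

# Hypothetical rows shaped like the prices column shown above.
toy_prices = pd.Series([
    [{'price': 1.5, 'type': 'printPrice'}],
    [{'price': 1.5, 'type': 'printPrice'},
     {'price': 0.99, 'type': 'digitalPurchasePrice'}],
])

def price_of(entries, wanted_type):
    """Return the price tagged with wanted_type, or NaN if it is absent."""
    for entry in entries:
        if entry.get('type') == wanted_type:
            return entry.get('price')
    return float('nan')

toy_print = toy_prices.map(lambda e: price_of(e, 'printPrice'))
toy_digital = toy_prices.map(lambda e: price_of(e, 'digitalPurchasePrice'))

print(toy_print)
print(toy_digital)
```

Missing digital prices come out as NaN, so count() and value_counts() behave the same way as in the cells that follow.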



In [62]:

    
digital_price.value_counts()









    Out[62]:





1.99     7040
3.99      137
2.99       88
0.99       44
0.00       44
7.99        3
6.99        3
4.99        3
19.99       1
dtype: int64



In [63]:

    
digital_price.count()









    Out[63]:





7363

Only 24.4% have been published digitally.

We drop the dirty column and add the clean data.



In [64]:

    
# del comics_df['prices'] would also work
comics_df = comics_df.drop('prices', axis=1)



In [65]:

    
comics_df['print price'] = print_price
comics_df['digital price'] = digital_price

The dates have exactly the same problem as the prices. Let's clean the data (data munging again).



In [66]:

    
dates = comics_df.dates
dates_serie = dates.apply(pd.Series)[0].apply(pd.Series)



In [67]:

    
on_sale_date = dates_serie.date.astype('datetime64[ns]')
on_sale_date.head()









    Out[67]:





0   2004-11-24 05:00:00
0   2003-10-08 04:00:00
0   2005-11-02 05:00:00
0   1999-06-01 04:00:00
0   1999-07-01 04:00:00
Name: date, dtype: datetime64[ns]



In [68]:

    
comics_df['On sale Date'] = on_sale_date



In [69]:

    
start = comics_df['On sale Date'].min()
end =  comics_df['On sale Date'].max()

yearly_range = pd.date_range(start, end, freq='365D6H')



In [70]:

    
comics_per_year = comics_df.groupby(on_sale_date.map(lambda x: x.year)).size()
comics_per_year.plot()









    Out[70]:





<matplotlib.axes._subplots.AxesSubplot at 0x10c742b10>



In [71]:

    
really_old = comics_df[on_sale_date==start]
print(start)









    



1753-07-29 03:43:41.128654848

WTF! Marvel is waaay older than we thought.



In [72]:

    
really_old.dates.iloc[1]









    Out[72]:





[{'date': '-0001-11-30T00:00:00-0500', 'type': 'onsaleDate'},
 {'date': '-0001-11-30T00:00:00-0500', 'type': 'focDate'}]

Or it is a formatting problem. Either way we don't want that data; it is noise.
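One way to keep such placeholder dates from turning into odd timestamps (the 1753 value above is presumably an artifact of the conversion) is to parse with pd.to_datetime and errors='coerce', so that anything not matching the format becomes NaT and can be dropped. This is a sketch under assumptions: the first string is an assumed raw form of a well-formed entry, and the explicit format string is a guess based on the value printed above.

```python
import pandas as pd

raw_dates = pd.Series([
    '2004-11-24T00:00:00-0500',   # assumed raw form of a well-formed entry
    '-0001-11-30T00:00:00-0500',  # the placeholder value seen above
])

# Strings that do not match the format are turned into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format='%Y-%m-%dT%H:%M:%S%z',
                        errors='coerce', utc=True)

valid = parsed.dropna()  # keep only the rows that parsed cleanly
print(parsed)
print(valid)
```

With this approach the noisy rows are identified by isna() rather than by comparing against the minimum date.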



In [73]:

    
#back_to_future_comics = comics_df[on_sale_date==end]
print(end)
back_to_future_comic = comics_df[comics_df['On sale Date'] == end]
back_to_future_comic.title









    



2020-12-31 05:00:00






    Out[73]:





0    Ant-Man: So (Trade Paperback)
Name: title, dtype: object



In [74]:

    
print("We are going to drop {} records.".format(really_old['On sale Date'].shape[0]))













We are going to drop 945 records.



In [75]:

    
comics_df = comics_df[comics_df['On sale Date'] != start]



In [76]:

    
comics_per_year = comics_df.groupby(comics_df['On sale Date'].map(lambda x: x.year)).size()
comics_per_year.plot()









    Out[76]:





<matplotlib.axes._subplots.AxesSubplot at 0x10d57e650>

Muuuch better.
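The per-year grouping above uses map with a lambda; for a datetime column the .dt accessor offers what should be an equivalent, vectorized shortcut. A minimal sketch on a hypothetical four-row frame (`toy` and its titles are made up; the dates echo some of those printed earlier):

```python
import pandas as pd

# Hypothetical miniature of comics_df with an already parsed date column.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'On sale Date': pd.to_datetime(
        ['1999-06-01', '1999-07-01', '2003-10-08', '2004-11-24']),
})

# .dt.year extracts the year of every timestamp at once.
per_year = toy.groupby(toy['On sale Date'].dt.year).size()
print(per_year)
```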



In [77]:

    
comics_df = comics_df.fillna(0)

Remember dropna? Well, there is also a fillna.
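One thing worth keeping in mind with fillna(0): the zeros take part in later averages, whereas mean() skips NaN by default. That probably explains why the yearly digital prices computed below sit well under the usual 1.99. A tiny sketch on hypothetical prices:

```python
import numpy as np
import pandas as pd

# Hypothetical digital prices: two comics without a digital edition.
digital = pd.Series([1.99, np.nan, np.nan, 1.99])

mean_skipping_nan = digital.mean()          # NaN values are ignored: 1.99
mean_after_fill = digital.fillna(0).mean()  # zeros are averaged in: about 0.995
n_with_price = digital.dropna().size        # rows that actually have a price

print(mean_skipping_nan, mean_after_fill, n_with_price)
```

Which behaviour is right depends on the question: "average price of the comics sold digitally" calls for skipping NaN, while filling with 0 answers something closer to "average digital revenue per comic".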



In [78]:

    
comics_group = comics_df.groupby(comics_df['On sale Date'].map(lambda x: x.year))



In [79]:

    
# Select several columns with a list; tuple-style selection is not supported by recent pandas.
price_per_year = comics_group[['print price', 'pageCount', 'digital price']].mean()



In [80]:

    
price_per_year









    Out[80]:






  
    
      
              print price   pageCount  digital price
On sale Date
1939             0.100000   68.000000       0.000000
1940             0.100000   68.000000       0.000000
1941             0.100000   68.000000       0.361818
1942             0.100000   68.000000       0.000000
1943             0.092308   59.282051       0.000000
1944             0.090909   51.636364       0.000000
1945             0.088889   37.481481       0.000000
1946             0.089655   46.620690       0.000000
1947             0.081250   40.250000       0.000000
1948             0.056000   26.200000       0.000000
1949             0.050000   16.000000       0.000000
1950             0.100000   41.333333       0.000000
1951             0.090909   32.727273       0.000000
1952             0.078947   28.421053       0.000000
1953             0.080000   27.600000       0.000000
1954             0.065714   26.742857       0.000000
1955             0.070968   26.709677       0.000000
1956             0.075000   26.100000       0.000000
1957             0.078261   28.173913       0.000000
1958             0.053333   26.400000       0.000000
1959             0.073913   15.739130       0.000000
1960             0.070000   20.080000       0.019800
1961             0.084433    8.860825       0.082062
1962             0.105114   20.545455       0.361818
1963             0.090636   19.309091       0.814091
1964             0.096439   25.606061       0.964848
1965             0.094173   27.683453       1.030791
1966             0.099161   28.867133       0.827972
1967             0.100811   27.885135       0.510541
1968             0.107081   28.118012       0.401553
...                   ...         ...            ...
1986             0.575697   28.701195       0.356773
1987             0.573469   26.800000       0.203061
1988             0.770741   30.162963       0.243222
1989             0.921711   33.800000       0.130921
1990             0.812363   29.879121       0.183132
1991             0.809682   29.862069       0.187215
1992             0.871466   27.570681       0.197906
1993             1.074494   29.283117       0.320468
1994             1.154888   27.363128       0.188994
1995             1.290229   25.124183       0.312190
1996             1.108352   25.260536       0.152490
1997             1.127034   27.027586       0.466621
1998             0.901168   28.963504       0.275985
1999             2.720977   68.877193       0.403985
2000             0.697008   25.819672       0.481189
2001             0.415556   20.518519       0.853735
2002             0.909750   27.633333       0.784944
2003             3.073240   42.234676       0.616865
2004             3.908775   16.430834       0.615702
2005             4.454624    8.347863       0.675239
2006             4.576349    0.092105       0.713520
2007             4.660382    0.000000       0.657806
2008             7.574411   50.649746       0.585883
2009             8.891997   77.159628       0.505063
2010             8.374445   74.563660       0.553129
2011             9.318335   82.327663       0.653859
2012             9.536820   81.411090       0.912588
2013             9.095204   77.181307       0.755102
2014             8.369410   66.957640       0.006036
2020            19.990000  136.000000       0.000000

77 rows × 3 columns



In [81]:

    
price_per_year.plot()









    Out[81]:





<matplotlib.axes._subplots.AxesSubplot at 0x10cf0a9d0>



In [82]:

    
plt.figure()
with pd.plotting.plot_params.use('x_compat', True):
    price_per_year['print price'].plot(color='r')
    price_per_year['digital price'].plot(color='g')

Thank you!



In [ ]:

